IMPROVEMENTS IN AND RELATING TO THE ANALYSIS OF DNA

This invention concerns improvements in and relating to the analysis of DNA,
particularly, but not exclusively, in relation to the analysis of DNA for use in forensic
science. More particularly the invention concerns the provision of information on the
chance of contamination and other developments which use that information.

There are a variety of ways in which contaminants can become incorporated in a
DNA sample and hence figure in the results. Some such contaminants are addressed, or at
least a warning is provided, in existing analysis systems. However, such systems do not
provide any account for sporadic contamination. The invention has amongst its aims to
account thoroughly for such sporadic and undetected contamination; to provide a clear
indication as to the potential level of error that may arise from such contamination; to
provide guidance as to the threshold at which alternative analysis techniques or protocols
should be used; to provide improved methods of operating DNA databases, particularly in
terms of additional data which accompanies the DNA profile results reported by
organisations to the DNA database operator; and, in particular, to provide a method of
estimating the number of false positives for a DNA analysis unit and their associated
likelihood ratios.



According to a first aspect of the invention we provide a method of providing
information on DNA samples, the method including:-

in respect of one or more negative controls, obtaining information on whether or
not DNA is suggested as present in the negative controls;

determining the probability of DNA being suggested as present in the negative
controls, the determination being based on the number of the negative controls which
suggest DNA is present compared with the total number of negative controls considered;

the probability of DNA being indicated as present in the negative control being
equated to the probability of the DNA samples being contaminated.
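As a minimal sketch of the calculation in this aspect, assuming each negative control has already been reduced to a yes/no flag (True where DNA is suggested as present), the contamination probability is simply the flagged count over the total. The function name and the example counts are illustrative assumptions and are not taken from the specification.

```python
from typing import Sequence


def contamination_probability(dna_suggested: Sequence[bool]) -> float:
    """Return the fraction of negative controls in which DNA is suggested.

    Following the method described above, this fraction is equated to the
    probability of a DNA sample being contaminated.
    """
    total = len(dna_suggested)
    if total == 0:
        raise ValueError("at least one negative control is required")
    return sum(1 for flag in dna_suggested if flag) / total


# Illustrative data only: 100 negative controls, 8 of which suggest DNA.
controls = [True] * 8 + [False] * 92
p_contamination = contamination_probability(controls)
print(f"P(contamination) = {p_contamination:.2f}")
```

Negative controls in which no DNA is suggested are deliberately kept in the denominator, since leaving them out would overstate the contamination rate.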

The information on contamination may relate to sporadic contamination and/or
undetected contamination.

The contamination may be due to persons involved in the collection and/or
handling and/or analysis of the sample. The contamination may be due to the reagents
involved in the collection and/or handling and/or analysis of the sample. The
contamination may be due to the equipment involved in the collection and/or handling
and/or analysis of the sample.

The method may provide information on contamination for the whole or a part of
the process between collection of the sample and reporting of the sample. Information may
be provided on the overall probability of contamination. Information may be provided on
the probability of contamination arising from one or more stages of the overall process. A
stage may be the crime scene stage for a sample, for instance between the point before the
sample is reached and the point at which the sample is dispatched by the person collecting
the sample. A stage may be the evidence recovery unit stage, for instance between the
point at which a sample is received at an evidence handling unit and the point at which it is
dispatched to an analysis stage. A stage may be an analysis stage, for instance between the
point of receipt by the analysis stage and the completion of the DNA profiling of the
sample. The analysis stage may be performed by a DNA analysis unit.

The method may provide information on contamination due to one or more
elements of the process. An element of the process may extend through one or more of the
process stages or may be a feature of a single stage. An element may be the staff involved
in the process. This embodiment of the invention may particularly involve the features,
options or possibilities set out below in relation to the fifth aspect of the invention,
incorporated herein by reference.

Preferably at least 20, more preferably at least 50 and ideally at least 100 negative
controls are used.

Preferably the negative controls provide information on the contamination of
samples passing through a DNA analysis process above and beyond information on the
batch of samples with which the negative control is analysed.

The one or more negative controls may be used to provide information on the
contamination for all or part of the process. Preferably the one or more negative controls
pass through the stages of the process they are to provide information on the contamination
for. Preferably the one or more negative controls pass through the stages of the process
they are to provide information for in an equivalent manner to the samples as they pass
through those stages of the process. Preferably the manner of collection for a negative
control is equivalent to the manner of collection for a sample, but without involving the
sample. Preferably the manner of handling of a negative control is equivalent to the
manner of handling of a sample. Preferably the manner of analysis of a negative control is
equivalent to the manner of analysis of a sample. Preferably the negative controls pass
through a crime scene stage in an equivalent manner to samples. Preferably the negative
controls pass through an evidence recovery stage in an equivalent manner to samples.
Preferably the negative controls pass through an analysis stage in an equivalent manner to
the samples.

The one or more negative controls may be used to provide information on the
contamination for all elements or an element of the process. Preferably the one or more
negative controls interact with the element or elements of the process they are to
provide information on the contamination for. Preferably the one or more negative
controls interact with the element or elements of the process they are to provide
information for in an equivalent manner to the samples. Preferably the manner of
interaction with the persons involved in one or all stages of the process is the same for a
negative control as it is for a sample. Preferably the manner of interaction of a negative
control with reagents is the same as it is for a sample. Preferably the manner of interaction
of a negative control with equipment is the same as it is for a sample.

Preferably the information on the contamination of DNA samples is provided in
respect of a period of time. The period of time may be a fixed period, for instance a month.
After the elapse of the time period the method may be repeated. The repeat of the method
may provide revised information.

A proportion or, more preferably, all of the negative control samples occurring
during a period of time for which information on the contamination is required may be
used. Where only a portion of the negative controls are used, preferably these are selected
at random. The negative controls used preferably include any for which no DNA is
suggested as present.

Preferably the negative controls considered and the samples considered are in
respect of the same time period.

The information on the contamination of DNA samples may be provided in respect
of one or more elements. The elements may be one or more of the people involved with
the samples and/or negative controls, the reagents involved with the samples and/or
negative controls, the equipment involved with the samples and/or negative controls. After
a change in one or more of the elements, for instance a change in equipment supplier, the
method may be repeated. The repeat of the method may provide revised information.

A proportion or, more preferably, all of the negative control samples occurring
during a set of elements for which information on the contamination is required may be
used. Where only a portion of the negative controls are used, preferably these are selected
at random. The negative controls used preferably include any for which no DNA is
suggested as present.

Preferably the negative controls considered and the samples considered are in
respect of the same set of elements.

The information on whether or not DNA is suggested as present in a negative
control may include allele position and/or allele length and/or peak area and/or peak height.
Preferably DNA is suggested as present where an indication is present which exceeds one
or more criteria. Preferably DNA is suggested as not present where an indication is not
present or any indication present does not exceed one or more criteria. Preferably
equivalent consideration is given to the negative controls as to the samples. Preferably the
same characteristics and/or same criteria are used.

Preferably the determination of the probability of DNA being indicated as present
in the negative control is the number of negative controls which suggest DNA is present
divided by the total number of negative controls.

With respect to a sample, the method may further provide a determination of the
probability of a sample suggesting DNA is present in the sample, but that DNA arises from
contamination only. This determination may involve, in respect of one or more samples,
obtaining information on whether or not DNA is suggested as present in the sample. The
number of samples not suggesting DNA is present compared with the total number of
samples considered may be assumed to determine the probability of a sample not
suggesting DNA as present. The probability of a sample not suggesting DNA as present
may be used together with the probability of a negative control being contaminated. The
probability of a sample not suggesting DNA as present may be multiplied by the
probability of a negative control being contaminated to give the probability of a sample
suggesting DNA is present in the sample, but that DNA arises from contamination only.

With respect to a sample, the method may further provide a determination of the
probability of a sample suggesting DNA is present in the sample, the DNA arising from the
sample and from contamination. This determination may involve, in respect of one or
more samples, obtaining information on whether or not DNA is suggested as present in the
sample. The number of samples not suggesting DNA is present compared with the total
number of samples considered may be assumed to determine the probability of a sample
not suggesting DNA as present. The probability of a sample not suggesting DNA as
present may be used together with the probability of a negative control being contaminated.
The probability of a sample not suggesting DNA as present may be multiplied by the
probability of a negative control being contaminated to give the probability of a sample
suggesting DNA is present in the sample, but that DNA arises from contamination only.
The determination may further involve subtracting the probability of a sample suggesting
DNA is present, but that DNA arises from contamination only, from the probability of a
negative control being contaminated.
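The two determinations above reduce to a multiplication and a subtraction, sketched below. The function name, dictionary keys and example counts are assumptions made for illustration; the contamination probability is taken as already estimated from the negative controls.

```python
from typing import Dict


def sample_contamination_probabilities(
    n_samples_without_dna: int,
    n_samples_total: int,
    p_contamination: float,
) -> Dict[str, float]:
    """Split the contamination probability into the two cases described above."""
    if n_samples_total <= 0:
        raise ValueError("at least one sample is required")
    if not 0 <= n_samples_without_dna <= n_samples_total:
        raise ValueError("count of samples without DNA is out of range")
    if not 0.0 <= p_contamination <= 1.0:
        raise ValueError("p_contamination must lie between 0 and 1")

    # Probability of a sample not suggesting DNA as present.
    p_no_sample_dna = n_samples_without_dna / n_samples_total
    # DNA is suggested as present, but it arises from contamination only.
    contaminant_only = p_no_sample_dna * p_contamination
    # DNA arises from both the sample and contamination.
    sample_and_contaminant = p_contamination - contaminant_only
    return {
        "p_no_sample_dna": p_no_sample_dna,
        "contaminant_only": contaminant_only,
        "sample_and_contaminant": sample_and_contaminant,
    }


# Illustrative numbers only: 50 of 200 samples suggest no DNA and the
# negative controls gave a contamination probability of 0.08.
probs = sample_contamination_probabilities(50, 200, 0.08)
print(probs)
```

With these example numbers a quarter of the contamination events fall on samples that carry no DNA of their own, which is the case in which the contaminant alone could yield a profile.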

The method may be applied to one or more groups of samples and/or negative
controls. A group may be the samples and negative controls from one operating
organisation. A group may be the samples and negative controls from one processing line
of an operating organisation. The information from the method may be provided to one or
more of the subsequent users of the DNA profile or results underlying it, together with the
DNA profile or results underlying it. One such subsequent user may be the provider of a
DNA database, for instance a database of profiles from known persons and/or from known
items or locations and/or from unknown persons. The provider of a DNA database may
require the provision of information of the method. Further options and possibilities for
this embodiment of the invention are set out in the fourth aspect of the invention below,
and are incorporated herein by reference.

The information may be used to assist in defining the format, for instance
protocol, followed in a DNA sample analysis process. The DNA sample analysis process
may use two or more protocols dependent on one or more variables. The or one of the
variables may be the peak height and/or peak area detected for a sample and/or negative
control. For instance, where a peak height and/or peak area is detected above a threshold a
first protocol may be used in the analysis. For instance, where a peak height and/or peak
area is detected at or below a threshold a second protocol may be used. The second
protocol may be a low copy number protocol. The second protocol may include analysis of
at least duplicate samples of the sample in question. The second protocol may discard the
results of the analysis where the first and second analyses of the same sample produce
results which are outside of a defined level of similarity. The present threshold for
different protocols to be applied is 50 random fluorescence units, 50 rfu.

16/07 2003 14:16 FAX 0113 243 0448 UOL LEEDS ♦ OOCUMENTREC 0010/03

The information of the present method may be used to determine whether or not
the threshold is set at an appropriate level. The present invention may be used to determine
the appropriate level for the threshold. Further options, features and possibilities for this
embodiment of the invention are set out below in the second and/or third aspects of the
invention and they are incorporated herein by reference.

When obtaining information on the negative controls and/or samples, peak height
and/or peak area and/or allele length and/or allele number may be obtained. In respect of a
stage of the process and/or the overall process and/or an element of the process and/or all
elements of the process the frequency of occurrence of particular peak heights and/or peak
areas may be considered, for instance in relation to small ranges which cover the spread of
peak height and/or peak areas encountered. The sum of the peak heights and/or areas may
be so considered. The proportion of negative controls which generate peak heights and/or
peak areas above the threshold level may be considered. This consideration may provide
information as to the level of potential problem contamination for that threshold. This
process may be repeated for the threshold and/or one or more revised thresholds. This

process may be used to suggest a revised threshold for use in determining which protocol
to follow. The revised threshold may be higher or lower than the
threshold. The threshold may be defined in terms of a random fluorescence unit value.
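The threshold review described above can be sketched as a scan over candidate thresholds, reporting for each the proportion of negative controls whose summed peak height lies above it. The data values, the candidate thresholds other than 50 rfu, and the function names are hypothetical; a summed peak height of zero is assumed to stand for a control in which no DNA is suggested.

```python
from typing import Dict, Iterable, Sequence


def proportion_above(summed_peak_heights: Sequence[float], threshold: float) -> float:
    """Proportion of negative controls with summed peak height above the threshold."""
    if not summed_peak_heights:
        raise ValueError("at least one negative control is required")
    above = sum(1 for height in summed_peak_heights if height > threshold)
    return above / len(summed_peak_heights)


def scan_thresholds(
    summed_peak_heights: Sequence[float], thresholds: Iterable[float]
) -> Dict[float, float]:
    """Repeat the proportion calculation for each candidate threshold."""
    return {t: proportion_above(summed_peak_heights, t) for t in thresholds}


# Hypothetical summed peak heights (rfu) for ten negative controls.
negative_controls = [0, 0, 0, 0, 0, 0, 20, 45, 60, 150]
table = scan_thresholds(negative_controls, [50, 100, 200])
for threshold, proportion in table.items():
    print(f"threshold {threshold} rfu: {proportion:.0%} of negative controls above")
```

A table of this kind gives a direct view of how much potentially reportable contamination each candidate threshold would admit.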

The negative controls may be ranked according to the sum of their peak heights,
from highest to lowest. The samples may be ranked according to the sum of their peak
heights, from lowest to highest.

The method may include simulating sample DNA and contamination DNA in
combination. A plurality of such combinations may be simulated. Preferably each
simulated mixture possible, formed of one negative control from amongst the one or more
negative controls and one sample from amongst the one or more samples, is simulated.
Preferably negative controls and/or samples which suggest no DNA present are included
amongst the possibilities from which the pairs are generated. Simulations with DNA due
only to the sample contribution and/or with DNA due only to the negative control and/or
due to both the sample and negative control and/or with no DNA present may be obtained.
Preferably the proportions, for instance percentages, of each type of simulation are defined
by the probabilities of occurrence of the positions defining those types.
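A minimal sketch of the pairwise simulation follows, assuming each negative control and each sample is summarised by a single summed peak height and that a value of zero means no DNA is suggested. The type labels, function names and data are illustrative assumptions rather than terms from the specification.

```python
from collections import Counter
from itertools import product
from typing import Dict, Sequence

SIMULATION_TYPES = ("sample_only", "contaminant_only", "sample_and_contaminant", "no_dna")


def classify_pair(control_height: float, sample_height: float) -> str:
    """Name the simulation type for one negative control paired with one sample."""
    if control_height > 0 and sample_height > 0:
        return "sample_and_contaminant"
    if control_height > 0:
        return "contaminant_only"
    if sample_height > 0:
        return "sample_only"
    return "no_dna"


def simulate_pairings(
    control_heights: Sequence[float], sample_heights: Sequence[float]
) -> Dict[str, float]:
    """Form every control/sample pairing and return the proportion of each type."""
    pairs = list(product(control_heights, sample_heights))
    if not pairs:
        raise ValueError("need at least one negative control and one sample")
    counts = Counter(classify_pair(c, s) for c, s in pairs)
    return {kind: counts[kind] / len(pairs) for kind in SIMULATION_TYPES}


# Hypothetical data: 1 of 4 controls shows DNA, 1 of 5 samples shows none.
controls = [0, 0, 0, 60]
samples = [0, 300, 500, 800, 1200]
proportions = simulate_pairings(controls, samples)
print(proportions)
```

Because every pairing is enumerated, the proportions equal the products of the underlying probabilities; here contaminant-only pairings make up 0.25 x 0.2 = 0.05 of the total.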

Preferably the simulation includes information on the quantity of DNA present in
the simulated mixture due to the sample and due to the negative control. Preferably the
simulation includes information on the peak area and/or peak height for DNA present in
the simulated mixture due to the sample and due to the negative control.

Where the proportions of the simulations which are negative control DNA only
and/or negative control and sample DNA together are above a proportion threshold then the
method may include further consideration of at least one or more of those simulation types.
The same proportion threshold may be used for each type or different proportion thresholds
may be used.

Particularly in respect of simulations which are a mixture of sample DNA and
contaminant DNA, but potentially in respect of one or more of the other types too, the
following further features of the method may be used. Preferably for one or more of the
simulations, potentially all of the simulations, the mixture proportion from the sample and
negative control is determined. The mixture proportion may be defined as the sum of the
peak height from the negative control divided by the sum of the peak height from the
sample. The mixture proportion may be defined as the sum of the peak area from the
negative control divided by the sum of the peak area from the sample. The proportion of
simulations with a mixture proportion relative to one or more specified levels may be
established. The proportion may be those simulations with a negative control contribution
>1. The information on mixture proportion may be used to indicate the proportion of cases
in which the contamination is the greater part of the mixture and/or is above a level of
concern.
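The mixture proportion defined above can be sketched as follows. The sketch assumes summed peak heights are already available for each negative control and each sample, and it restricts itself to pairings in which both contribute DNA; the names and values are hypothetical.

```python
from itertools import product
from typing import Sequence


def mixture_proportion(control_peak_sum: float, sample_peak_sum: float) -> float:
    """Summed peak height of the negative control divided by that of the sample."""
    if sample_peak_sum <= 0:
        raise ValueError("the sample must contribute DNA to form a mixture")
    return control_peak_sum / sample_peak_sum


def fraction_above_level(
    control_sums: Sequence[float], sample_sums: Sequence[float], level: float = 1.0
) -> float:
    """Fraction of simulated mixtures whose mixture proportion exceeds the level."""
    ratios = [
        mixture_proportion(c, s)
        for c, s in product(control_sums, sample_sums)
        if c > 0 and s > 0  # keep only true mixtures of sample and contaminant
    ]
    if not ratios:
        return 0.0
    return sum(1 for r in ratios if r > level) / len(ratios)


# Hypothetical summed peak heights (rfu).
controls = [60, 150]
samples = [100, 300, 1000]
major_contaminant = fraction_above_level(controls, samples, level=1.0)
print(f"contaminant is the major component in {major_contaminant:.1%} of mixtures")
```

A level of 1 marks the point at which the contaminant contributes more than the sample; a lower level of concern can be passed in to flag mixtures earlier.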

Particularly in respect of simulations which are contaminant DNA only, but
potentially in respect of one or more of the other types too, the following further features of
the method may be used. Likelihood ratios may be calculated for the simulations. The one
or more likelihood ratios determined may be the ratio of probabilities where the numerator
is the probability of the evidence in the result/DNA profile originating from the suspect and
the denominator is the probability of the evidence in the result/DNA profile originating
from a random unknown person. Preferably likelihood ratios are only calculated for those
simulations in respect of which the peak height and/or peak area, preferably in summed
form, is above the threshold applying. Preferably the frequency of occurrence at one or
more likelihood ratio levels is calculated with the threshold applying. The threshold
applying may be varied to alter the frequency with which a given likelihood ratio occurs.
The threshold may be raised to decrease the frequency with which a likelihood ratio occurs
and/or to increase the likelihood ratio which occurs with a given frequency. The threshold
may be lowered to increase the frequency with which a likelihood ratio occurs and/or to
decrease the likelihood ratio which occurs with a given frequency. The thresholds for
different operating organisations and/or different processing lines may be adjusted to
balance frequency of likelihood ratios between them.
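The effect of the threshold on the frequency of a given likelihood ratio can be illustrated with the sketch below. It assumes a likelihood ratio has already been computed for each contaminant-only result by some separate means, and it takes the frequency relative to all contaminant-only results considered; both are assumptions, as are the names and the data.

```python
from typing import Sequence, Tuple

# Each result is (summed peak height in rfu, likelihood ratio); values are hypothetical.
Result = Tuple[float, float]


def frequency_of_lr(results: Sequence[Result], threshold: float, lr_level: float) -> float:
    """Fraction of results above the threshold with a likelihood ratio at or above lr_level."""
    if not results:
        raise ValueError("at least one result is required")
    hits = sum(
        1 for height, lr in results if height > threshold and lr >= lr_level
    )
    return hits / len(results)


contaminant_only = [
    (20, 5.0),
    (45, 80.0),
    (60, 2_000.0),
    (90, 500.0),
    (150, 1_000_000.0),
]

freq_at_50 = frequency_of_lr(contaminant_only, threshold=50, lr_level=1_000)
freq_at_100 = frequency_of_lr(contaminant_only, threshold=100, lr_level=1_000)
print(freq_at_50, freq_at_100)
```

In this example raising the threshold from 50 to 100 rfu halves the frequency with which a contaminant alone gives a likelihood ratio of at least 1000, in line with the behaviour described above.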

The above method may be used alongside other contamination prevention and/or
detection steps, such as the use of elimination databases which contain profiles of staff who
could contact the samples and/or negative controls.

The first aspect of the invention may include any of the features, options or
possibilities set out elsewhere in this document.



According to a second aspect of the invention we provide a method of providing
information on possible errors in a method of analysis, the method of analysis including a
threshold which determines the analysis protocol to be applied to the analysis of DNA, the
method including:-

in respect of one or more negative controls, obtaining information on whether or
not DNA is suggested as present in the negative controls;

determining the probability of DNA being suggested as present in the negative
controls, the determination being based on the number of the negative controls which
suggest DNA is present compared with the total number of negative controls considered;

the probability of DNA being indicated as present in the negative control being
equated to the probability of the DNA samples being contaminated;

in respect of one or more DNA samples, obtaining information on whether or not
DNA is suggested as present in the DNA sample;

obtaining information about the quantity of DNA in a DNA sample or negative
control;

comparing the quantity of DNA in a negative control sample with the threshold to
establish the number or proportion of negative controls on one or other side of the
threshold.

In this way an indication is provided as to the number of potential false positives
which could occur as sufficient contaminant DNA could be present in those cases to give a
reportable result.

The method may include adjusting the level of the threshold to alter the number or
proportion of negative controls on one or other side of the threshold. The method may
include adjusting the level of the threshold to reduce the number or proportion of negative
controls above the threshold. In this way the number of potential false positives may also
be addressed.
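One way to picture the threshold adjustment is to search a list of candidate thresholds for the lowest one that keeps the proportion of negative controls above it within a chosen limit. The sketch below does this under stated assumptions: the candidate list, the limit and the DNA quantities (summed peak heights in rfu) are hypothetical, and the selection rule itself is an illustration rather than a rule taken from the specification.

```python
from typing import Iterable, Optional, Sequence


def lowest_acceptable_threshold(
    control_quantities: Sequence[float],
    candidate_thresholds: Iterable[float],
    max_proportion_above: float,
) -> Optional[float]:
    """Return the lowest candidate threshold meeting the limit, or None if none does."""
    if not control_quantities:
        raise ValueError("at least one negative control is required")
    for threshold in sorted(candidate_thresholds):
        above = sum(1 for q in control_quantities if q > threshold)
        if above / len(control_quantities) <= max_proportion_above:
            return threshold
    return None


# Hypothetical DNA quantities for ten negative controls.
negative_controls = [0, 0, 0, 0, 0, 0, 20, 45, 60, 150]
candidates = [50, 75, 100, 150, 200]

chosen = lowest_acceptable_threshold(negative_controls, candidates, 0.10)
strict = lowest_acceptable_threshold(negative_controls, candidates, 0.0)
print(chosen, strict)
```

Choosing the lowest threshold that meets the limit keeps as many samples as possible on the first protocol while bounding the number of potential false positives.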

According to a third aspect of the invention we provide a method of providing
information on, in a method of analysis, the likelihood of a result arising due to
contamination, the method of analysis including a threshold which determines the analysis
protocol to be applied to the analysis of DNA, the method of providing information including:-

in respect of one or more negative controls, obtaining information on whether or
not DNA is suggested as present in the negative controls;

determining the probability of DNA being suggested as present in the negative
controls, the determination being based on the number of the negative controls which
suggest DNA is present compared with the total number of negative controls considered;

the probability of DNA being indicated as present in the negative control being
equated to the probability of the DNA samples being contaminated;

in respect of one or more DNA samples, obtaining information on whether or not
DNA is suggested as present in the DNA sample;

obtaining information about the quantity of DNA in a DNA sample or negative
control;

simulating one or more mixtures, the mixtures each being formed from a pairing
of a negative control sample and a DNA sample from amongst the one or more negative
controls and the one or more DNA samples;

establishing the mixture proportion for one or both of the following types of
simulated mixture: DNA from contamination only, DNA from both DNA sample and
contamination;

determining a likelihood ratio in respect of a result arising for one or both of the
types of simulated mixture.

In this way information quantifying the risk of a false positive is provided.

The following features, options and possibilities may apply to any of the forms of
the invention set out in this document, but are particularly applicable to the second and
third aspects of the invention.

The information on possible errors may be an indication as to the number of
negative controls which contain a quantity of DNA above the threshold. The information
on possible errors may be an indication as to the number of contaminated samples which
contain DNA above the threshold.

The threshold is preferably a measure of quantity of DNA present. It may be
defined in terms of peak area and/or peak height, particularly in respect of a summed value.

Preferably samples above or at and above the threshold are subjected to a first
protocol. Preferably samples at and below or below the threshold are subjected to a second
protocol. The second protocol may be a low copy number protocol.

The quantity of DNA may be peak area and/or peak height and/or summed peak
area and/or summed peak height.

The simulation may involve simulating each possible pairing of a negative control
and sample.

The mixture proportion for one or more of the simulation mixture types: DNA
from DNA sample only, no DNA from DNA sample or contaminant may also be
established.

A likelihood ratio may be determined in respect of one or both of these mixture
types.

Separate likelihood ratios may be determined in respect of one or more of the
different simulation mixture types.

A probability of achieving a given likelihood ratio may be determined. Such a
determination may be made in respect of one or more likelihood ratio levels and/or may be
made in respect of one or more threshold values.

The method may include varying the threshold to give a predetermined likelihood
ratio and/or predetermined probability of achieving a likelihood ratio. The method may
include varying the threshold to give a likelihood ratio of >10^3.

Preferably likelihood ratios are only determined using negative controls and/or
samples and/or simulations in which the quantity of DNA is above the threshold in
question.

Preferably the method is applied independently to different operating
organisations and/or different processing lines within organisations. Different operating
organisations and/or different processing lines within organisations may be provided with
different thresholds as a result of the method.

The second and/or third aspects of the invention may include any of the features,
options or possibilities set out elsewhere in this document.

According to a fourth aspect of the invention we provide a method of operating a
database containing information on DNA from samples, the method of operating
including:-

introducing into the database results from one or more sources;

the operator of the database specifying to the sources that the sources collect
information according to a method for providing information on DNA samples, that
method including:-

in respect of one or more negative controls, obtaining information on whether or
not DNA is suggested as present in the negative controls;

determining the probability of DNA being suggested as present in the negative
controls, the determination being based on the number of the negative controls which
suggest DNA is present compared with the total number of negative controls considered;

the probability of DNA being indicated as present in the negative control being
equated to the probability of the DNA samples being contaminated.

The sources may be one or more operating organisations.

The information on DNA samples may be reviewed by the database operator. The
database operator may use the information to specify the threshold at which the source uses
one or more protocols in their analysis. The database operator may specify a threshold
below which the source needs to use a particular protocol, such as a low copy number
protocol. The database operator may vary the threshold from time to time, particularly
according to variations in the information obtained. The database operator may specify
that results from the source may only be introduced onto the database where the threshold
is applied and/or where the threshold is applied according to the level specified by the
database operator and/or where the information is collected.

The fourth aspect of the invention may include any of the features, options or
possibilities set out elsewhere in this document.



According to a fifth aspect of the invention we provide a method of providing
information on the contamination of DNA samples by persons involved in the processing
of DNA samples, the method including:-

determining DNA information of the same type as being analysed for in respect of
one or more of the persons involved in processing the DNA samples;

determining the number of samples and/or negative controls contaminated by the
one or more persons for whom the DNA information has been determined due to the
detection of DNA information corresponding to their DNA information in samples and/or
negative controls;

determining the proportion of samples and/or negative controls handled by such
persons;

determining the proportion of persons for whom the DNA information has been
determined compared with the total number of persons involved in processing the DNA
samples.



Preferably the proportion of samples and/or negative controls contaminated is
divided by the proportion of persons for whom the DNA information has been determined
to give the total proportion of samples and/or negative controls contaminated by the total
number of persons involved in the processing of the DNA samples. The method may be
applied to one or more stages of the overall process. The method may be applied to the
overall process. The calculation for the overall process may consider the cumulative effect
of different stages. This may involve the consideration of different apparent contamination
rates and different proportions of persons for whom the DNA information has been
determined for different parts and/or stages of the process.
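The scaling step described for the fifth aspect can be sketched as a single division. The sketch assumes that persons whose DNA information has not been determined contaminate at the same rate as those whose information has; that assumption, the function name and the example counts are illustrative only.

```python
def estimated_total_staff_contamination(
    n_items_contaminated: int,
    n_items_total: int,
    n_staff_profiled: int,
    n_staff_total: int,
) -> float:
    """Scale the observed staff contamination rate up to all persons involved.

    n_items_contaminated counts samples and/or negative controls in which DNA
    matching a profiled person was detected.
    """
    if n_items_total <= 0 or n_staff_total <= 0:
        raise ValueError("totals must be positive")
    if not 0 < n_staff_profiled <= n_staff_total:
        raise ValueError("at least one, and at most all, staff must be profiled")
    if not 0 <= n_items_contaminated <= n_items_total:
        raise ValueError("contaminated count is out of range")

    observed_proportion = n_items_contaminated / n_items_total
    profiled_proportion = n_staff_profiled / n_staff_total
    return observed_proportion / profiled_proportion


# Hypothetical figures: 6 of 1000 items match a profiled person, and
# 30 of the 40 persons involved have had their DNA information determined.
estimate = estimated_total_staff_contamination(6, 1000, 30, 40)
print(f"estimated proportion contaminated by all staff: {estimate:.3f}")
```

The same function could be applied stage by stage, with separate counts for each stage, before the results are combined for the overall process.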
The fifth aspect of the invention may include any of the features, options or
possibilities set out elsewhere in this document.

According to a sixth aspect of the invention we provide a method of determining
the threshold to be used within a method of analysis by an operating organisation to
determine which analysis protocol to apply, the method including:-

setting a threshold;

determining the likelihood ratio for false positives for that operating organisation
with that threshold;

adjusting the value of the threshold to ensure false positives do not exceed a desired
likelihood ratio.

The sixth aspect of the invention may include any of the features, options or
possibilities set out elsewhere in this document.

According to a seventh aspect of the invention we provide a method of analysing
DNA samples, the method including the analysis of negative controls to provide
information on the contamination of DNA samples, wherein, in respect of at least one
sample, a negative control arises at the point of the DNA sample's collection and is treated
in the same manner between that point and the conclusion of the analysis method.

The seventh aspect of the invention may include any of the features, options or
possibilities set out elsewhere in this document.

The invention will now be described, by way of example only, and with reference
to the accompanying drawings in which:-



Figure 1 is an illustration of the potential origin of 




Figure 2 is a plot of a number of occurrences against the height for
negative controls and for casework samples;

Figure 3 is a table presenting case samples ranked in order of increasing
summed peak height with numbers of alleles scored above a given peak height;

Figure 4 is a table illustrating negative controls ranked in descending
order of intensity, taken from a population of 295 negative controls;

Figure 5 is an analysis of a number of observations in respect of varying
relative mixture forms;

Figure 6 is a table setting out probability estimates for achieving a given
likelihood ratio where a laboratory contaminant is responsible for the major
(unmixed) profile; and

Figure 7 is a histogram plot showing probability of a contaminant giving
a reportable result (measured as log10 LR) relative to the reporting guideline.

All techniques are subject to potential sources of error. When analysing DNA 
samples to establish the DNA profile, a number of steps are taken to prevent contamination 
by other DNA. 

Attempts are also made to identify instances in which contamination is occurring. 
Such steps include the use of "elimination databases" which contain profile information
relating to the operators involved in the analysis process so as to allow for the identification
of results in which the operator contaminates the sample. The techniques also include
detection of potential cross contamination between one sample and another being
processed concurrently. This may occur where there is lane to lane leakage within the
analysis process, for instance.

Sporadic and undetected contamination of samples can still occur and could 
potentially give rise to results which in turn lead to false positives. Thompson et al (2003)
J. Forensic Sci. 48, 47-54 have recently suggested that false positives can dramatically
reduce the value of DNA evidence, especially when the prior odds that the suspect is the
source of an evidence sample are low. Such a situation occurs when a DNA database is
"trawled" to search for "cold hits", for instance. 

The present invention seeks to account thoroughly for such sporadic and
undetected contamination; to provide a clear indication as to the potential level of error that
may arise from such contamination; and to provide guidance as to the point at which
alternative analysis techniques or protocols should be used. The present invention also
seeks to provide improved methods of operating DNA databases, particularly in terms of
additional data which accompanies the DNA profile results reported by organisations to the
DNA database operator. In particular, the invention seeks to provide a method of
estimating the number of false positives for a DNA analysis unit and their associated
likelihood ratios.

Referring to Figure 1, the entire process involved in the collection, handling,
processing and reporting of DNA samples is illustrated. Within this overall process, three
discrete categories for the origins of contamination can be identified.

Firstly, contamination can arise at the crime scene. This may be due to
contamination by the investigating officers and/or by the reagents and/or equipment they 
use to collect evidence. This probability of contamination is denoted Pa. 

Secondly, when the collected sample is transferred to the evidence recovery unit,
ERU, contamination may again arise from the scientists involved and/or the reagents
and/or equipment they use. The probability of contamination from this part of the process
is denoted Pb.

Thirdly, within the DNA analysis unit, there is also potential contamination from
scientists and/or their reagents and/or equipment. The probability of contamination here is
denoted Pc.

As contaminants can pass from crime scene to ERU to the DNA analysis unit, at 
each stage there is an additional opportunity for contamination to occur. As a result the 
contamination process is additive, and the chances of contamination can be summarised as 
equal to Pa + Pb + Pc. 
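By way of illustration only, the additive summary above can be sketched in Python. The function name and the three stage probabilities used in the example call are hypothetical placeholders, not measurements from this document.

```python
def total_contamination(pa: float, pb: float, pc: float) -> float:
    """Additive summary of the chance of contamination: Pa + Pb + Pc."""
    for p in (pa, pb, pc):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie between 0 and 1")
    return pa + pb + pc


# Hypothetical stage probabilities (illustrative only, not measured values).
example_total = total_contamination(pa=0.01, pb=0.02, pc=0.088)
print(f"{example_total:.3f}")
```

The plain sum treats the chance of two stages both contaminating the same sample as negligible, which seems a reasonable simplification while each individual probability is small.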

Within DNA analysis units, negative controls are presently used. These are
samples, generated and analysed within the DNA analysis unit, which are assumed to be
DNA free. At present when a reportable DNA profile is observed in such a negative
control, the batch of samples is eliminated from further consideration due to suspected
contamination of all.

Recent investigations by the applicant indicate that this is not necessarily an
appropriate course of action, and may not address the issue, as most, if not all,
contamination events seen in negative controls are sporadic single tube events. As the
contaminant is specific to one tube only, it is unlikely that the
contamination detected for that tube will have any relevance to the associated batch of
extracted samples being processed.

The present invention seeks to use negative controls in a fundamentally different
way. Instead of using them as indicative of issues with the batch they form a part of, the
present invention uses negative controls in relation to the entire DNA process of a DNA
analysis unit. This is possible because negative controls are processed in the same way as
samples of interest, such as casework samples. Hence they can be used to estimate the
level of contamination in casework samples over the same period of time.

Whilst the invention is initially described below in relation to negative controls
generated in the DNA analysis unit, and hence reflecting the impact of the DNA analysis
unit on the process, it would be beneficial to use negative controls which reflect the entire
process and this is recommended. Thus, negative controls could be generated at the crime
scene, passed through the ERU stage and passed through the DNA analysis unit. Negative
controls in those circumstances would reflect the potential contamination arising from all
stages in the process. This might for instance involve the use of a moistened swab to
collect evidence at a crime scene, with an additional blank swab also being moistened with
water at the crime scene. Both would then be passed to the ERU in the same way, handled
and then passed on to the DNA analysis unit in the same way. Both samples would then be
analysed in the same way within the DNA analysis unit. As a result an estimate of Pa + Pb
+ Pc would be obtained. The technique is described in more detail below in relation to
estimating Pc only.

To obtain the benefits of the invention for a DNA unit it is desirable to obtain a
significant number of negative controls over a time period for which an assessment is being
made and also to obtain a number of casework DNA profiles from the same time period. The
casework DNA profiles may be a random selection from amongst all the casework profiles
conducted during the time period. Samples that failed to give any signal should be
included. Ideally all of the negative controls run during the time period are included,
including those samples for which no signal is obtained.

In a specific example relating to one analysis unit, 295 negative controls were
obtained covering a five month period. A random collection of 50 casework DNA profiles
obtained during the same period was also taken.

Out of the 50 casework samples analysed, 5 failed to give a result. The
probability of the DNA unit sample failing to give any profile, Pf, is therefore 5/50 = 0.1.

Out of the 295 negative controls analysed, a total of 26 samples gave a signal.
This means that the probability of a negative control giving a profile of one or more alleles,
Pn = 26/295 = 0.088.
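The two estimates above are simple proportions. A minimal Python sketch, using the counts quoted in the example (the helper name `proportion` and the variable names are illustrative assumptions), might read:

```python
def proportion(events: int, total: int) -> float:
    """Observed proportion of samples showing the event of interest."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= events <= total:
        raise ValueError("events must lie between 0 and total")
    return events / total


# Counts taken from the example: 5 of 50 casework samples gave no result,
# and 26 of 295 negative controls gave a signal.
p_f = proportion(5, 50)     # probability a casework sample gives no profile
p_n = proportion(26, 295)   # probability a negative control gives a profile

print(f"Pf = {p_f:.3f}, Pn = {p_n:.3f}")
```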

Contamination is only detected or known to have occurred if it is found in a
negative control tube purported to be free of DNA. The difficulty is that it is not possible
to assess directly whether a casework sample is affected by sporadic contamination, as
there is no supporting information. However, even though we cannot know which
particular casework tube is contaminated, we can assess the probability, Pk, of any given
tube being affected, because negative samples are simply a subset of casework samples.
They are treated in exactly the same way as casework samples within the DNA analysis
unit and hence are subject to the same contamination rates. The probability of the
casework sample being contaminated is the same as the probability that a negative control
is contaminated (where the contamination may be 1+ alleles). And so:-

Pk = Pn = 0.088

If a casework sample is contaminated then this will result in one of two different
outcomes:

a. If the casework sample is devoid of DNA then only the contaminant will
be visible and the profile is unmixed. The chance of this occurrence (Ps) is the
probability of contamination multiplied by the probability of a DNA unit sample
failing to give a profile:-

Ps = Pk x Pf

Specifically, in relation to the actual results for the unit described above, this gives
Ps = 0.088 x 0.1 = 0.009 (or approximately 0.9% of samples will be contaminated
and the profile does not appear admixed).

b. If the casework sample includes sample DNA and contaminant DNA
then there is no way of discerning this from casework samples merely containing
sample DNA. However, the chance of this occurrence is:-

Pn - Ps

Thus, in relation to the actual results of the DNA unit described above, Pn - Ps =
0.088 - 0.0088 = 0.079 (or 7.9%) of casework samples contain sporadic
contamination in admixture with a casework profile.
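The two outcomes can be checked numerically. The following Python lines are a sketch only, using the rounded values quoted above; the variable names are illustrative assumptions.

```python
# Rounded values quoted in the example above.
p_k = 0.088   # probability a casework sample is contaminated (taken equal to Pn)
p_f = 0.1     # probability a casework sample gives no profile of its own

# Outcome a: contaminant only, so the profile appears unmixed.
p_s = p_k * p_f

# Outcome b: contaminant in admixture with the casework profile.
p_mixed = p_k - p_s

print(f"unmixed contaminant: {p_s:.4f}")
print(f"contaminant in admixture: {p_mixed:.4f}")
```

Writing the second quantity as `p_k - p_s` is the same as `p_k * (1 - p_f)`, that is, contamination of a sample which does give a profile of its own.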



This analysis gives important information about the likelihood of sporadic
contamination occurring for a particular DNA analysis unit. A similar analysis process
applied to different DNA units will potentially reveal different likelihoods depending on
the procedures, equipment and staff present at that DNA unit. It is envisaged that, because
DNA databases receive results from a plurality of different DNA analysis units, each
DNA analysis unit would be required by the DNA database operator to provide this
information to allow false positive risk calculations to be performed. This requirement
could extend as far as DNA analysis units of different operating organisations, or to
different DNA analysis units whether operated by the same or different operating
organisations.

Whilst the above calculations indicate that sporadic contamination will affect
samples on a regular basis, and so provide useful guidance on that issue, they do not give
an indication of the actual impact. If the contaminant
alleles are all sub-50rfu (relative fluorescence units) in terms of the results they produce
then this is below the level set by the present Low Copy Number, LCN, guidelines.
Potential techniques for use as an LCN analysis protocol are detailed in WO01/79541, the
contents of which are incorporated herein by reference. As a result an LCN analysis
protocol would be used to ensure that the samples are subjected to a more rigorous
consideration. In most cases that will ensure they are not reported because the protocol
followed in such cases is to perform duplicate analysis of the sample, and not to consider
the results where the results of the two samples are insufficiently similar to one another. It
is very unlikely that equivalent contamination will occur to both samples in such cases, and
as a result the contamination will not be duplicated between the two samples.

To continue the assessment of the impact of this contamination rate it is necessary
to evaluate negative control and casework data in much greater detail with special
emphasis on their relative peak areas and heights. To do this, peak heights of all data are
combined. The results of this combination are plotted in relation to the above
mentioned experimental results. This information is illustrated in Figure 2. MATLAB.HISTCALC
was used to process the data. Note that for bigger data sets a more complete
analysis could be achieved using separate individual loci.

The analysis of Figure 2 reveals that the majority, 58%, of the contaminant peak
heights from the negative controls which reported DNA profiles are <50rfu. As such these
should be handled effectively by existing LCN protocols and so not give false positives.
However, 42% of the negative control samples give peak heights >50rfu. These would not
be subjected to LCN protocols as DNA analysis units presently operate with a 50rfu level
according to the LCN guidelines. Consequently, there is overlap between negative control
samples with DNA profiles reporting, and casework samples with DNA profiles reporting,
in the peak height range up to approximately 150rfu. Consideration of the casework
information presented in Figure 2 indicates that approximately 17% of alleles are to be
found reporting in the >50 <100 range, with approximately 70% of casework data having <250
peak height. A significant proportion of negative controls, therefore, fall within the range
in which the present sub-50rfu threshold would not lead to LCN protocols being applied.
As a consequence the duplicate sample verification would not assist in this area.
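The kind of peak height summary discussed above can be sketched as follows. This is not the MATLAB routine referred to in this document; it is a small Python illustration, and the list of peak heights is a hypothetical placeholder rather than data from Figure 2.

```python
from collections import Counter


def fraction_below(peak_heights, threshold_rfu=50):
    """Fraction of peaks whose height is below the given rfu threshold."""
    if not peak_heights:
        raise ValueError("no peak heights supplied")
    below = sum(1 for height in peak_heights if height < threshold_rfu)
    return below / len(peak_heights)


def bin_counts(peak_heights, bin_width=50):
    """Histogram of peak heights; each key is the lower edge of a bin."""
    return Counter((int(height) // bin_width) * bin_width for height in peak_heights)


# Hypothetical contaminant peak heights in rfu (illustrative only).
negative_control_peaks = [22, 31, 38, 44, 47, 49, 55, 63, 90, 140]

fraction_sub_threshold = fraction_below(negative_control_peaks)
histogram = bin_counts(negative_control_peaks)

print(f"fraction below 50rfu: {fraction_sub_threshold:.2f}")
for lower_edge in sorted(histogram):
    print(f"{lower_edge}-{lower_edge + 49} rfu: {histogram[lower_edge]}")
```

Applying the same two functions to casework peak heights would give the comparison between the two populations that the overlap argument relies upon.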

Again whilst this information on the issue is useful, the invention can provide
further benefit. Given that in a significant number of instances casework sample only
DNA, negative control only DNA and DNA from both casework sample and negative
control could report to the results, the potential for a misleading result in the subsequent
analysis stage is now discussed.

A MATLAB program (NEGSIMPROQ) was used to rank the sum of peak heights
of weakest → strongest casework samples, the table of Figure 3, and strongest → weakest
negative controls, the table of Figure 4, respectively. The worst scenario, in terms of
potential problems, is where a strong contaminant DNA signal combines with a weak or
absent casework sample DNA signal. To determine the extent to which such combinations
occur it is necessary to simulate such occurrences and their impact. To do this mixtures
were simulated by MATLAB.MIXSIMULATOR using pairwise combinations of casework
sample v. negative control (including all examples where profiles were absent). This
means that from 50 casework samples and 295 negative controls, pairwise combinations
generate 50 x 295 = 14750 mixtures.

The simulated results arising comprise the following types, and form the indicated
proportion of the 14750 mixtures:-

• Unmixed case samples only (82%)

• Unmixed contamination (0.9%)

• A mixture of case sample and contaminant (7.9%)

• No DNA profile detected (9.1%)

In respect of the 82% of samples which were casework sample only DNA and the 
9.1% of samples which contain no DNA there is no problem. 
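The pairwise simulation and the four resulting categories can be illustrated with a short Python sketch. It is not the MATLAB program referred to above. The counts of blank and non-blank samples follow the example (5 of 50 casework samples blank, 26 of 295 negative controls giving a signal), whereas the non-zero summed peak heights are hypothetical placeholders.

```python
from collections import Counter
from itertools import product

# Summed peak heights per sample; 0 denotes no profile.
# The non-zero values are hypothetical placeholders.
casework = [0.0] * 5 + [500.0] * 45
negative_controls = [0.0] * 269 + [80.0] * 26


def classify(case_sum, control_sum):
    """Assign a simulated pairing to one of the four result types."""
    if case_sum > 0 and control_sum > 0:
        return "mixture"
    if case_sum > 0:
        return "case only"
    if control_sum > 0:
        return "contaminant only"
    return "no profile"


# Every casework sample is paired with every negative control.
counts = Counter(classify(c, n) for c, n in product(casework, negative_controls))
total = sum(counts.values())

for label in ("case only", "contaminant only", "mixture", "no profile"):
    print(f"{label}: {counts[label]} of {total} ({100 * counts[label] / total:.1f}%)")
```

Because every casework sample is paired with every negative control, the proportions are simply the products of the marginal proportions, which is why they agree with the values of Ps and Pn - Ps calculated earlier.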

In respect of the 7.9% in the form of a mixture of casework sample DNA and
contaminant DNA, the mixture proportion (Mx) was calculated as Mx = sum of peak heights
of contaminant / sum of peak heights of casework sample. The distribution of Mx is given in the
table of Figure 5. Most mixtures gave Mx<1, which means that the casework sample was
the major component. In approximately 1 in 500 cases the major component was the
laboratory contaminant; in the most extreme example Mx = 25. However, this analysis
reveals that the chance of a false result in the case of DNA from sample and
contamination is very low, as only in 0.2% of the cases is the contamination of a high
enough level to run the risk of a false positive. By far and away the more significant risk
comes from the 0.9% of cases where only contaminant DNA is present (of course the
investigators do not know this is the source compared with the sample being the source).
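A minimal Python sketch of the mixture proportion follows. The function name and the peak heights in the two example calls are hypothetical and serve only to show one pairing with Mx below 1 and one with Mx above 1.

```python
def mixture_proportion(contaminant_peaks, casework_peaks):
    """Mx = summed contaminant peak heights / summed casework peak heights."""
    casework_total = sum(casework_peaks)
    if casework_total <= 0:
        raise ValueError("casework sample has no signal, so Mx is undefined")
    return sum(contaminant_peaks) / casework_total


# Hypothetical peak heights in rfu (illustrative only).
mx_case_major = mixture_proportion([60, 55], [400, 380, 420])
mx_contaminant_major = mixture_proportion([300, 280], [50, 66])

for mx in (mx_case_major, mx_contaminant_major):
    major = "casework sample" if mx < 1 else "contaminant"
    print(f"Mx = {mx:.2f}, major component: {major}")
```

Pairings in which the casework sample gives no signal are excluded from this calculation, since they belong to the unmixed contamination category rather than to the mixtures.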

Having obtained this useful indication that a significant number of situations arise
in which contamination is the only component, the impact of this upon casework reporting,
and upon any database the results are loaded into or used against, is
considered. As cases are reported in terms of likelihood ratios,
MATLAB.MIXSIMULATOR was used to calculate the likelihood ratios of DNA profiles
that originated from contamination only. Following the previously mentioned LCN
guidelines, alleles were not incorporated in calculations unless above the LCN threshold,
which is 50rfu. Only unmixed contaminants were considered. The
likelihood ratios are the ratio of probabilities where the numerator is the probability of the
evidence if the DNA profile originated from the suspect and the denominator is the
probability of the evidence if the DNA profile originated from a random unknown person.
The chance of a laboratory contaminant resulting in a reportable profile with the indicated
likelihood ratios is expressed not only with rfu = 50 but also with higher rfu thresholds,
below which level an LCN protocol is applied. With an rfu of 50 and an LR >10^7 the chance of
a laboratory contaminant resulting in a reportable profile was approximately 1 in 1000.
Higher rfu thresholds had a marked effect on the chance of contaminants resulting in a
reportable profile with such an LR. Figure 7 provides a graphical representation of the
results.
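The effect of the rfu threshold on the likelihood ratio of an unmixed contaminant profile can be illustrated with a simplified Python sketch. It is not the MATLAB program referred to above and rests on stated assumptions: the numerator is taken as 1, so the LR is the reciprocal of the profile frequency; loci are treated as independent; a locus with two alleles above the threshold is given the frequency 2pq; a locus with a single allele above the threshold is given 2p, allowing for an undetected partner allele; and a locus with no allele above the threshold is ignored. The allele frequencies and peak heights are hypothetical.

```python
from math import log10, prod


def likelihood_ratio(profile, threshold_rfu=50.0):
    """Simplified LR for an unmixed profile, using only alleles above the threshold.

    profile maps a locus name to a list of (allele_frequency, peak_height_rfu).
    """
    genotype_frequencies = []
    for alleles in profile.values():
        kept = [freq for freq, height in alleles if height > threshold_rfu]
        if len(kept) > 2:
            raise ValueError("more than two alleles at a locus: not an unmixed profile")
        if len(kept) == 2:
            # Two alleles above threshold: heterozygote frequency 2pq.
            genotype_frequencies.append(2 * kept[0] * kept[1])
        elif len(kept) == 1:
            # Assumption: a lone allele is given 2p (capped at 1).
            genotype_frequencies.append(min(1.0, 2 * kept[0]))
        # Loci with no allele above the threshold contribute nothing.
    return 1.0 / prod(genotype_frequencies)


# Hypothetical contaminant profile (frequencies and heights are illustrative).
contaminant = {
    "locus_A": [(0.10, 120.0), (0.20, 90.0)],
    "locus_B": [(0.05, 70.0), (0.10, 40.0)],
}

lr_at_50 = likelihood_ratio(contaminant, threshold_rfu=50.0)
lr_at_100 = likelihood_ratio(contaminant, threshold_rfu=100.0)
print(f"log10 LR at 50rfu: {log10(lr_at_50):.2f}")
print(f"log10 LR at 100rfu: {log10(lr_at_100):.2f}")
```

In this illustration raising the threshold removes alleles from the calculation and so lowers the LR that the contaminant could attain, which is the direction of the effect described above.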

The above mentioned investigation reveals the importance of the rfu threshold set
by the LCN guidelines, at which an LCN protocol cuts in, being set at the appropriate level
for a DNA analysis unit for any particular period of time. If the rfu threshold is at an
appropriate level then an LCN protocol will be applied to interpret the majority of cases
which could otherwise give rise to problems. If the LCN threshold is set too low, then a
significant number of cases may be processed outside of an LCN protocol, even though such
a protocol would be more appropriate so as to address the risk of contamination making a
meaningful contribution to the DNA profile determined.
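One possible way of choosing such a threshold for a given unit and period is sketched below in Python. It is an illustration under stated assumptions and not a definitive implementation: the LR model is the same simplified one (reciprocal of the profile frequency, 2pq for two alleles, 2p for a lone allele), the contaminant profiles are hypothetical, and the names `reportable_rate`, `lowest_acceptable_threshold`, `lr_limit` and `max_rate` are introduced here for illustration only.

```python
from math import prod


def likelihood_ratio(profile, threshold_rfu):
    """Simplified LR for an unmixed profile (hypothetical model, see lead-in)."""
    factors = []
    for alleles in profile.values():
        kept = [freq for freq, height in alleles if height > threshold_rfu]
        if len(kept) > 2:
            raise ValueError("more than two alleles at a locus: not an unmixed profile")
        if len(kept) == 2:
            factors.append(2 * kept[0] * kept[1])
        elif len(kept) == 1:
            factors.append(min(1.0, 2 * kept[0]))
    return 1.0 / prod(factors)


def reportable_rate(contaminants, n_controls, threshold_rfu, lr_limit):
    """Fraction of all negative controls whose contaminant LR exceeds lr_limit."""
    hits = sum(1 for profile in contaminants
               if likelihood_ratio(profile, threshold_rfu) > lr_limit)
    return hits / n_controls


def lowest_acceptable_threshold(contaminants, n_controls, candidates, lr_limit, max_rate):
    """Smallest candidate threshold keeping the rate at or below max_rate, else None."""
    for threshold in sorted(candidates):
        if reportable_rate(contaminants, n_controls, threshold, lr_limit) <= max_rate:
            return threshold
    return None


# Hypothetical contaminant profiles observed among 295 negative controls.
contaminants = [
    {"locus_A": [(0.10, 120.0), (0.20, 90.0)], "locus_B": [(0.05, 70.0), (0.10, 40.0)]},
    {"locus_A": [(0.10, 60.0)]},
]

chosen = lowest_acceptable_threshold(
    contaminants, n_controls=295, candidates=[50, 100, 150],
    lr_limit=100.0, max_rate=0.001,
)
print(f"lowest acceptable threshold: {chosen} rfu")
```

Repeating the calculation with the negative controls from a later period would give the periodic re-evaluation of the threshold for that unit.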

Just as it is perfectly possible for the technique to establish that different DNA
analysis units will need a different LCN threshold, it is perfectly possible that the threshold
could change between different time periods for a given DNA analysis unit. Thus periodic
re-evaluation of the threshold is advisable. Changes in procedure, reagent characteristics,
equipment characteristics and the like could all cause variations with time. Similarly, if the
negative controls account for the crime scene and ERU contaminant contributions as well,
variations in the applicable threshold may also arise over time. The technique described
has been illustrated with reference to an example used for accounting for any
contamination within the DNA analysis unit. It would be possible for differences in
contamination arising from different crime scene types, different crime scene officers,
different ERU units and the like to be considered separately, or more preferably, to monitor
contamination as a whole within the entire sequence of steps.

In an example of one such quantification of the risk of contamination from one
part of the operation, the present invention can be used to quantify the chance of
contamination from scientists involved in the same part of the process as those who have
already submitted their profile to an elimination database. The same part of the process
could be the DNA analysis unit part of the process.

To establish the chance of contamination occurring in this part of the process the
number of instances of contamination from a person included in the elimination database is
noted. This number of occasions can be used together with knowledge of the proportion of
people who are included in the elimination database to quantify the overall chance. Thus if
one third of the samples were handled by scientists on the elimination database and 0.01%
of the overall samples handled were determined to be contaminated by these people, the
overall level of contaminated samples would be 0.03%.

Table 2: Negative controls ranked in descending order of intensity, taken from a population of 295
negative controls - only 26 controls that gave a signal are listed, i.e. 275 controls were blank.